Newly Revised and Updated Formatting Standard for Project Galactic Guide
Revised 19930420 by Paul Clegg, with lots of information supplied by
Stephane Lussier, Tobias B Koehler, and everyone on alt.galactic-guide
Introduction:
The point of all this is to have a very, very extensive reference for
programmers and editors to create and maintain the data archives for
Project Galactic Guide. This extensive formatting design is necessary
because the Guide will be (and already has been) ported
to various computer architectures, and not all computers use the same
character sets, or can handle the same type of information. In particular,
the Unix systems that most of us use can only handle 7-bit ASCII for
mailings, news posts, etc, so we are constrained to use the worst
possible character set for our data.
This does not mean that we cannot represent alternate character sets.
This was the primary reason for updating the design into an extremely
complex standard in the first place. The purpose has since expanded
to include various text effects, margin control, etc., that are or might
be needed to properly portray specific articles.
This text here should not intimidate field researchers in any way.
Articles will be accepted in raw ASCII format, hand-written hardcopy,
or even in text printed with a word processor package. The editors
would like to encourage field researchers to use the following
standard, to lighten their workload, but the hierarchy here does not
at all require field researchers to use this format for their
submissions.
With that aside, I now cast you into the world of 7-bit data
representations...
Special Characters:
This section details all the special characters that might be used in
any given article. Accompanying the name of the character is the
code, 7-bit replacement (if there is no better replacement in any
given character set), and numerical codes for several popular
character sets. Most of the information contained within this section
has been derived from Tobias B. Koehler's posting to alt.galactic-guide.
Definitions of accents:
breve accent: \_/ (above letter)
acute accent: / (above letter)
grave accent: \ (above letter)
circumflex: /\ (above letter)
hacek accent: \/ (above letter)
tilde: ~ (above letter)
two dots: .. (above letter)
ring: o (above letter)
two acute acc: // (above letter)
dot: . (above letter)
cedilla: _) (under letter)
ogonek hook: (_ (under letter)
Special letters: Eth and Thorn are special Icelandic characters. The
uppercase Eth looks like a slashed D, the lowercase eth looks like a
horizontally flipped 6 with a slash. The uppercase Thorn looks like
the upper half of a b combined with the lower half of a p. The long s
looks like the f without the horizontal bar; the sharp s is a ligature
of a long s and a normal s. Both are German thingies.
code: Textual code
repl: 7-bit replacement to be used if the character is not available
EC: TeX Extended Computer Modern character set code
ISO: ISO 8859/1 (Amiga, Windows) character set code
850: IBM codepage 850 (MS-DOS, OS/2) character set code
Most important: To represent a backslash (which is normally an escape
character to denote a special character or effect) use a double backslash:
\\ inserts a single \ character.
code | repl | description | position: EC | ISO | 850
\ch`` " Eng dbl left/Ger dbl right quote 16 147
\ch'' " English double right quote 17 148
\ch,, " German double left quote 18 132
\ch<< " French double left quote 19 171 174
\ch>> " French double right quote 20 187 175
\ch < ` French single left quote 14 139
\ch > ' French single right quote 15 155
\ch-- -- long dash (as opposed to hyphen) 22 151 196
\ch r d degree sign 6 176 248
\ch$$ $ paragraph or section sign 159 167 245
\%o o/oo promille sign 37+24 137
\chOC (C) copyright sign 169 184
\chOR (R) registered trademark sign 174 169
\ch=L L pound sterling sign 191 163 156
\chuA A A with breve accent 128
\ch;A A A with ogonek hook 129
\ch`A A A with grave accent 192 192 183
\ch'A A A with acute accent 193 193 181
\ch^A A A with circumflex 194 194 182
\ch~A A A with tilde 195 195 199
\ch"A Ae A with two dots 196 196 142
\chrA Aa A with ring (ala Angstrom) 197 197 143
\chAE AE AE ligature 198 198 146
\chua a a with breve accent 160
\ch;a a a with ogonek hook 161
\ch`a a a with grave accent 224 224 133
\ch'a a a with acute accent 225 225 160
\ch^a a a with circumflex 226 226 131
\ch~a a a with tilde 227 227 198
\ch"a ae a with two dots 228 228 132
\chra aa a with ring 229 229 134
\chae ae ae ligature 230 230 145
\ch'C C C with acute accent 130
\chvC C C with hacek accent 131
\ch,C C C with cedilla 199 199 128
\ch'c c c with acute accent 162
\chvc c c with hacek accent 163
\ch,c c c with cedilla 231 231 135
\chvD D D with hacek accent 132
\ch-D D slashed D or Eth (\chEt) 208 208 209
\ch-d d slashed d 158
\chet eth (\chet) 240 240 208
\chvE E E with hacek accent 133
\ch;E E E with ogonek hook 134
\ch`E E E with grave accent 200 200 212
\ch'E E E with acute accent 201 201 144
\ch^E E E with circumflex 202 202 210
\ch"E E E with two dots 203 203 211
\chve e e with hacek accent 165
\ch;e e e with ogonek hook 166
\ch`e e e with grave accent 232 232 138
\ch'e e e with acute accent 233 233 130
\ch^e e e with circumflex 234 234 136
\ch"e e e with two dots 235 235 137
\chuG G G with breve accent 135
\chug g g with breve accent 167
\ch.I I I with dot 157
\ch`I I I with grave accent 204 204 222
\ch'I I I with acute accent 205 205 214
\ch^I I I with circumflex 206 206 215
\ch"I I I with two dots 207 207 216
\ch i i dotless i 25 213
\ch`i i i with grave accent 236 236 141
\ch'i i i with acute accent 237 237 161
\ch^i i i with circumflex 238 238 140
\ch"i i i with two dots 239 239 139
\ch j j dotless j 26
\ch'L L L with acute accent 27
\ch-L L slashed L 138
\ch'l l l with acute accent 168
\ch-l l slashed l 169
\ch'N N N with acute accent 139
\chvN N N with hacek accent 140
\chNJ Nj NJ ligature 141
\ch~N N N with tilde 209 209 165
\ch'n n n with acute accent 170
\chvn n n with hacek accent 171
\chnj nj nj ligature 173
\ch~n n n with tilde 241 241 164
\chhO Oe O with two acute accents 142
\ch`O O O with grave accent 210 210 227
\ch'O O O with acute accent 211 211 224
\ch^O O O with circumflex 212 212 226
\ch~O O O with tilde 213 213 229
\ch"O Oe O with two dots 214 214 153
\chOE OE OE ligature 215 140
\ch/O Oe slashed O 216 216 157
\chho oe o with two acute accents 174
\ch`o o o with grave accent 242 242 149
\ch'o o o with acute accent 243 243 162
\ch^o o o with circumflex 244 244 147
\ch~o o o with tilde 245 245 228
\ch"o oe o with two dots 246 246 148
\choe oe oe ligature 247 156
\ch/o oe slashed o 248 248 155
\ch'R R R with acute accent 143
\chvR R R with hacek accent 144
\ch'r r r with acute accent 175
\chvr r r with hacek accent 176
\ch'S S S with acute accent 145
\chvS S S with hacek accent 146 138
\ch,S S S with cedilla 147
\ch's s s with acute accent 177
\chvs s s with hacek accent 178 154
\ch,s s s with cedilla 179
\chss ss sharp s 255 223 225
\chls s long s
\chvT T T with hacek accent 148
\ch,T T T with cedilla 149
\chTh Thorn 222 222 232
\ch,t t t with cedilla 181
\chth thorn 254 254 231
\chhU UE U with two acute accents 150
\chrU U U with ring 151
\ch`U U U with grave accent 217 217 235
\ch'U U U with acute accent 218 218 233
\ch^U U U with circumflex 219 219 234
\ch"U Ue U with two dots 220 220 154
\ch.U U U with dot
\chhu ue u with two acute accents 182
\chru u u with ring 183
\ch`u u u with grave accent 249 249 151
\ch'u u u with acute accent 250 250 163
\ch^u u u with circumflex 251 251 150
\ch"u ue u with two dots 252 252 129
\ch.u u u with dot
\ch"Y Y Y with two dots 152
\ch'Y Y Y with acute accent 221 221 237
\ch"y y y with two dots 184 152
\ch'y y y with acute accent 253 253 236
\ch'Z Z Z with acute accent 153
\chvZ Z Z with hacek accent 154
\ch.Z Z Z with dot 155
\ch'z z z with acute accent 185
\chvz z z with hacek accent 186
\ch.z z z with dot 187
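For programmers, one way to use the table is as a lookup keyed on the two
characters that follow \ch. What follows is only a rough sketch in Python
and not part of the standard: the names CH_TABLE and replace_ch are made up
for illustration, only a handful of rows are carried over, and it assumes
the reader wants either the 7-bit replacement or the ISO 8859/1 character.

```python
import re

# A few rows from the table: code -> (7-bit replacement, ISO 8859/1 position).
# None means the table lists no ISO position for that character.
CH_TABLE = {
    '"A': ("Ae", 196),
    '"a': ("ae", 228),
    '"u': ("ue", 252),
    "^e": ("e", 234),
    ",c": ("c", 231),
    "ss": ("ss", 223),
    "OC": ("(C)", 169),
    "'C": ("C", None),
}

# Either an escaped backslash (\\) or \ch followed by exactly two characters.
_CH = re.compile(r"\\(?:(\\)|ch(..))")


def replace_ch(text, eight_bit=False):
    """Expand \\chXX codes; other escape codes are left untouched."""
    def sub(match):
        if match.group(1):
            return "\\"                      # "\\" inserts a single backslash
        code = match.group(2)
        if code not in CH_TABLE:
            return match.group(0)            # unknown code: keep it verbatim
        repl, iso = CH_TABLE[code]
        if eight_bit and iso is not None:
            return bytes([iso]).decode("latin-1")
        return repl                          # fall back to the 7-bit form
    return _CH.sub(sub, text)
```

Calling replace_ch(r'Gr\ch"u\chsse') falls back to the 7-bit replacements,
while eight_bit=True uses the ISO positions wherever the table has one.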
NOTE: The following information was mostly picked out of one of Stephane
Lussier's numerous informative posts. The following are REALLY special
characters that are usually only used in special circumstances, such as
mathematical texts. I do not have the resources to research the characters
in the various character sets, so in this case, the character code is
followed by the 7-bit ASCII representation and a short explanation.
Greek Characters:
code |repl|description
\Galp a lower case alpha
\GALP A upper case alpha
\Gbet b lowercase beta
\GBET B uppercase beta
\Ggam g lowercase gamma
\GGAM G uppercase gamma
\Gdel d lowercase delta
\GDEL D uppercase delta
\Geps e lowercase epsilon
\GEPS E uppercase epsilon
\Gzet z lowercase zeta
\GZET Z uppercase zeta
\Geta h lowercase eta
\GETA H uppercase eta
\Gthe o lowercase theta
\GTHE O uppercase theta
\Giot i lowercase iota
\GIOT I uppercase iota
\Gkap k lowercase kappa
\GKAP K uppercase kappa
\Glam l lowercase lambda
\GLAM L uppercase lambda
\G*mu m lowercase mu
\G*MU M uppercase mu
\G*nu n lowercase nu
\G*NU N uppercase nu
\G*xi x lowercase xi
\G*XI X uppercase xi
\Gomi o lowercase omicron
\GOMI O uppercase omicron
\G*pi pi lowercase pi
\G*PI PI uppercase pi
\Grho p lowercase rho
\GRHO P uppercase rho
\Gsig s lowercase sigma
\GSIG S uppercase sigma
\Gtau t lowercase tau
\GTAU T uppercase tau
\Gups u lowercase upsilon
\GUPS U uppercase upsilon
\Gphi o lowercase phi
\GPHI O uppercase phi
\Gchi x lowercase chi
\GCHI X uppercase chi
\Gpsi y lowercase psi
\GPSI Y uppercase psi
\Gome w lowercase omega
\GOME W uppercase omega
Note: Some 7-bit representations are duplicated (for example, "o" stands for
theta, omicron, and phi). From a programming standpoint, it's probably
preferable to replace the symbol with its full name (sans upper/lowercase),
since the 7-bit letters don't coincide well with the real characters.
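The full-name replacement suggested in the note could look something like
the sketch below. It is an illustration only: GREEK_NAMES and greek_to_name
are hypothetical names, and it assumes every Greek code is \G followed by
exactly three characters, as in the table.

```python
import re

# Three-character code (lowercased) -> full name, taken from the table.
GREEK_NAMES = {
    "alp": "alpha", "bet": "beta", "gam": "gamma", "del": "delta",
    "eps": "epsilon", "zet": "zeta", "eta": "eta", "the": "theta",
    "iot": "iota", "kap": "kappa", "lam": "lambda", "*mu": "mu",
    "*nu": "nu", "*xi": "xi", "omi": "omicron", "*pi": "pi",
    "rho": "rho", "sig": "sigma", "tau": "tau", "ups": "upsilon",
    "phi": "phi", "chi": "chi", "psi": "psi", "ome": "omega",
}

# \G followed by three letters, or by "*" and two letters.
_GREEK = re.compile(r"\\G(\*[A-Za-z]{2}|[A-Za-z]{3})")


def greek_to_name(text):
    """Replace each \\Gxxx code with the letter's name, ignoring case."""
    def sub(match):
        name = GREEK_NAMES.get(match.group(1).lower())
        return name if name is not None else match.group(0)
    return _GREEK.sub(sub, text)
```

Unknown codes are left in place so that a later revision of the table does
not silently lose text.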
Mathematical Characters:
code |repl|description
\M**8 oo infinity
\M*+- +- plus over minus
\MNOT - negation character (horizontal bar w/ short vertical bar on left)
\M*lv V logic: OR
\M(+) (+) logic: XOR (Exclusive OR)
\M(/) 0 empty set notation
\M*|^ v logic: NOR (down arrow type of thing)
\M--> --> implication
\M-/> -/-> "does not imply"
\M<-- <-- implication
\M</- <-/- "does not imply"
\M<-> <--> double implication
\M</> <-/-> "there is no double implication"
\M==> ==> implication
\M=/> =/=> "does not imply"
\M<== <== implication
\M</= <=/= "does not imply"
\M<=> <==> equivalence
\M</> <=/=> "there is no equivalence"
\M*-= = congruence (three horizontal bars)
\M/-= != not congruent
\M*/= != not equal (slashed equal sign)
\M**~ ~ is equivalent to
\M*~- ~- isomorphism (tilde over single bar)
\M*~~ ~= approximately equals (two stacked wavy lines)
\M*~= = wavy line over equal sign
\M*)( asymptotal (upcurve over downcurve)
\M*|| || two parallel lines
\M*rA upturned A, "for all"
\M*rE reversed E, "there exists"
\M/rE slashed reversed E, "there does not exist"
\M*.: three dots in triangle, "therefore"
\M**U U union
\M*rU intersection (overturned U)
\M**E "is an element of"
\M*/E "is not an element of"
\M**C C "is a subset of"
\M*/C !C "is not a subset of"
\M**X X Cartesian product sign
\M**| | Full vertical bar for absolute values, etc.
\M*/| !| Does not divide (vertical bar w/ slash)
\M**o o Composition (small circle)
\M**. * Product (small point)
\M**> Derivable, right pointing hollow triangle
\M**< Normal subgroup notation, left pointing hollow triangle
\M**% Division sign (circle over and below horizontal line)
\M*>= >= Greater than or equal to
\M/>= !>= Not greater than or equal to
\M*<= <= Less than or equal to
\M/<= !<= Not less than or equal to
\Mint Integration sign
\Mont Integration sign with small circle on it
\M**' ' Prime
\M**" " Double prime
\M*'" '" Triple prime (etc. up to \M""", sextuple prime)
Formatting Effects:
The following sections include various special text effects and devices to
allow various platforms to display various things in special formats. Since
monospaced ASCII has been shown to not work very well, particularly with
varying display widths, it is impossible to relegate text formatting to the
ASCII dump. Many of the ideas within this section have been taken straight
from Stephane Lussier's post(s), though everyone's posts have influenced the
end result you see here.
Text Effects:
Text effects are things such as bold, italic, superscript, subscript,
underline, and other visual effects that may be applied to text to make
it more visually appealing, clear, and informative.
All format controls are denoted by a backslash, a code (usually four
letters), and a left curly brace ("{"). These sections are terminated by a
right curly brace ("}"). The text that should have the given effect
should be inside the two curly braces. Because there may easily be a reason
to have a right curly brace in the text, a right curly brace is denoted as
\}, to indicate that it is not part of the text coding. There is no reason
for an alternate marker for left curly braces.
Bold: \bold{ <text> }
Italic: \ital{ <text> }
Underlined: \undl{ <text> }
Double Underlined: \dund{ <text> }
Subdued: \subd{ <text> }
Flashing: \flsh{ <text> }
Subscript: \subs{ <text> }
Superscript: \sups{ <text> }
Effects primarily used in mathematics:
Overlined: \ovrl{ <text> }
Right Arrow Over Expression (vector): \raro{ <text> }
Left Arrow Over Expression: \laro{ <text> }
Hat Over Expression: \mhat{ <text> } Note: <text> here must be a single
character.
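As a rough illustration of the brace rules, here is a sketch of how a plain
ASCII reader might walk through effect codes. Nothing here is mandated by
the standard: render_plain is a hypothetical name, and showing bold as
*text*, italic as /text/ and underline as _text_ is purely an assumption
made for the example. Effects with no obvious ASCII form are simply dropped.

```python
import re

# Effect code -> marker placed on both sides of the text (assumed rendering).
EFFECTS = {
    "bold": "*", "ital": "/", "undl": "_", "dund": "_",
    "subd": "", "flsh": "", "subs": "", "sups": "",
    "ovrl": "", "raro": "", "laro": "", "mhat": "",
}

# Escaped backslash, escaped right brace, a four-letter code with its
# opening brace, or a bare closing brace.
_TOKEN = re.compile(r"\\\\|\\\}|\\([a-z]{4})\{|\}")


def render_plain(text):
    out, stack, pos = [], [], 0
    for match in _TOKEN.finditer(text):
        out.append(text[pos:match.start()])
        pos = match.end()
        token = match.group(0)
        if token == "\\\\":
            out.append("\\")
        elif token == "\\}":
            out.append("}")                  # literal brace, not a terminator
        elif token == "}":
            out.append(stack.pop() if stack else "}")
        elif match.group(1) in EFFECTS:
            marker = EFFECTS[match.group(1)]
            out.append(marker)
            stack.append(marker)             # emitted again at the closing brace
        else:
            out.append(token)                # not a text effect: keep verbatim
            stack.append("}")
    out.append(text[pos:])
    return "".join(out)
```

For instance, render_plain(r"Do \bold{NOT} hyphenate") gives
"Do *NOT* hyphenate". A bare left curly brace passes straight through, which
is why no alternate marker is needed for it.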
NOTE: Very intricate mathematical formatting instructions may eventually
be included in this standard, but they are not being included in this
version. Programmers writing reader code should ignore the \MATH{ <text> }
escape code sequence and everything inside it. This will allow reader
programs written to this format to handle the only major expansion that I
foresee in the future, or at least not barf if they come across an article
with the expanded math features.
Addendum: You WILL have to check to make sure all the curly braces are
matched within the \MATH structure, in order to figure out when the \MATH
structure ends. Within MATH structures, \{ and \} indicate curly braces
with no escape codes attached (and thus don't affect the stack of braces).
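The brace matching described in the addendum might be sketched as below.
This is a hedged example rather than reference code: skip_math is a made-up
name, a simple counter stands in for the stack of braces, and it assumes
that an unterminated \MATH structure swallows the rest of the text.

```python
def skip_math(text):
    """Return text with every \\MATH{...} structure removed."""
    opener = "\\MATH{"
    out = []
    i = 0
    while i < len(text):
        if text.startswith("\\\\", i):
            out.append("\\\\")               # escaped backslash: copy as-is
            i += 2
        elif text.startswith(opener, i):
            depth = 1                        # the brace opened by \MATH{
            i += len(opener)
            while i < len(text) and depth > 0:
                if text.startswith(("\\{", "\\}", "\\\\"), i):
                    i += 2                   # escaped: does not touch the count
                    continue
                if text[i] == "{":
                    depth += 1
                elif text[i] == "}":
                    depth -= 1
                i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

Everything outside a \MATH structure, including other escape codes, is
copied through unchanged.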
Standard Structure:
The body of every article is organized into sections. For instance, should
this become an entry, this paragraph is considered a section. A table would
have to be used for the character codes above, and that would be another
section. In this case even the subtitles (such as "Standard Structure:")
would be separate sections. Whether or not sections should be separated
by blank lines is optional, and may be left to a user-defined option, or
programmer's choice; the ruling is not made here.
Text formatting codes (such as underline, etc. as listed above) should be
reset to default in between sections. If a text style is to be continued
into the next section, the proper codes must be re-applied within the
section's curly braces.
Paragraphs:
The type of section that should be most common would be the standard
paragraph. A paragraph is denoted by \para{ followed by all the text that
should go into that paragraph. The paragraph must be terminated by an
ending }. Escape codes are allowed in paragraphs provided they are not
section codes. You cannot embed paragraphs inside other paragraphs, nor
can you embed matrices, lists, etc. within paragraphs. An example paragraph:
\para{This is an example paragraph. Other than the initial escape code, and
the ending curly brace, and any required escape codes within this text, this
text should be completely \bold{ASCII}. For electronic mail transmission
purposes, the length of a line should not be more than 78 characters in
width, and lines of fewer than 76 characters are appreciated. Because the end
of a paragraph is only when a \} is found, the reader programs can wrap text
on their own, and so the EOL can be relatively ignored. Do \bold{NOT}
hyphenate words.}
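Since the EOLs inside a paragraph carry no meaning, a reader can join the
lines and wrap them again for its own display width. The sketch below shows
one way to do that with Python's textwrap module. The name wrap_para is
hypothetical, the escape codes inside the paragraph are left unexpanded for
brevity, and the 78 column default is just the mail limit mentioned above.

```python
import textwrap


def wrap_para(source, width=78):
    """Re-wrap the body of a single \\para{...} section to the given width."""
    body = source.strip()
    opener = "\\para{"
    if not body.startswith(opener) or not body.endswith("}"):
        raise ValueError("expected a \\para{...} section")
    if body.endswith("\\}"):
        raise ValueError("paragraph ends in an escaped brace, not a terminator")
    inner = body[len(opener):-1]
    # Collapse the original line breaks and runs of spaces, then wrap again.
    # Words are never split or hyphenated.
    return textwrap.fill(
        " ".join(inner.split()),
        width=width,
        break_long_words=False,
        break_on_hyphens=False,
    )
```

A real reader would expand the escape codes before measuring line lengths,
since the codes themselves take up no room on the display.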
Individual Lines:
Often individual lines are wanted or required, particularly for things such
as subsection headers, and so on. An individual line is still considered
a section, and as such should leave a blank line after it. However single
lines are much more flexible than paragraphs in most respects, and there are
actually several types of individual lines that may be employed in an
article.
Justification: Single lines may be justified in any one of three ways:
left, right, and center. The codes for this are, respectively, \jstl{ },
\jstr{ }, and \cntr{ }.
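On a fixed-width display the three justification codes map fairly directly
onto string padding. A minimal sketch, assuming an 80 column display and
using the hypothetical names JUSTIFY and justify_line:

```python
# Justification code -> padding method of str.
JUSTIFY = {
    "jstl": str.ljust,
    "jstr": str.rjust,
    "cntr": str.center,
}


def justify_line(code, text, width=80):
    """Pad a single line according to its justification code."""
    return JUSTIFY[code](text.strip(), width)
```

A reader with proportional fonts would have to compute the offsets itself;
the padding approach only works for monospaced output.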
Preformat: A single line may be dictated as being preformatted, or absolute,
where the reader should accept the text as being formatted for a 75 column
display and should not try to "play" with the text involved. This is
included only for those rare problems, and should not be used if at all
possible. The escape code is \PREF{ <text> }. Textual effects may still
be applied to the text contained in a preformatted line, but spacing should
not be toyed with by the reader program.
Special Effects: A single line allows us some freedom in other ways, too.
Inserting a \. into a single line inserts a line feed, such that the text
should drop to the same column, but the next row. This may be accomplished
almost as easily, if not more easily, by simply using several preformat
commands.
Internal Passages:
Long quotes should be given special cases, being different from a standard
paragraph. Text enclosed in the \quot{} formatting code should be treated
as a normal paragraph, but it should be indented on both sides when
displayed. For an 80 column text screen, a five space indent on both sides
is suggested.
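The suggested indent could be produced along these lines. This is a sketch
only: render_quote is a hypothetical name, and the defaults simply repeat
the 80 column, five space suggestion above.

```python
import textwrap


def render_quote(text, width=80, indent=5):
    """Wrap a \\quot{} body with the same indent on both sides."""
    pad = " " * indent
    return textwrap.fill(
        " ".join(text.split()),
        width=width - indent,        # keeps the right-hand margin clear
        initial_indent=pad,
        subsequent_indent=pad,
        break_long_words=False,
        break_on_hyphens=False,
    )
```

With the defaults, every line starts in column 6 and ends no later than
column 75.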
Lists:
Lists are obviously used for lists of information, which may be any number
of things. The list command, however, also works for outline designs, which
are basically specialized list designs. There are several types of lists,
and all of them may be nested within each other, with the one exception of
the military notation list (see below). In any case, an element in a list
should be offset from the left margin by some number of characters; for an
80 column display, the suggested indent space is 10 characters. Text that
wraps around a display should be indented so as to line up with the first
character of the actual text, and not just with the first digit of the
element identifier. Sublists, or lists embedded in other lists, should
be indented again. For all lists, the list type is used only to determine
the type of list. Each element in the list must be contained in a \item{}
field.
Arabic Number List: This is your basic list, with elements numbered 1, 2,
3, etc. The escape code for this type of list is \LSAx { ... }, where
x is the character that follows the number (see below).
Lowercase Letter List: This uses the alphabet to denote its elements. The
first element will be marked by "a", the next by "b", etc. There
may NOT be more than 26 elements in a letter list. The escape code is
\LSlx { ... }.
Uppercase Letter List: Exactly like the \LSlx { ... } list type, but
using uppercase letters instead. The escape code is \LSLx { ... }, and
it too is restricted to 26 or fewer elements.
Lowercase Roman List: Uses lowercase Roman numerals, i, ii, iii, iv, etc.
The escape code is \LSrx { ... }.
Uppercase Roman List: Uses uppercase Roman numerals, I, II, III, IV, etc.
The escape code is \LSRx { ... }.
No Identifier List: This does not use any number or character to
differentiate between elements. The escape code is \LS_x { ... }, which
allows the author to still use special characters listed below to mark
elements.
Military Notation List: This is a tricky one. Only Military Notation
Lists may be nested within Military Notation Lists. The identifying
numbers are in Arabic numerals (i.e. decimal), but also show the hierarchy
of the list itself. The reader program must run through the list and
determine how deep the sublists embedded in the list go, as each number
must be expanded to show this. Thus, if you have a list that has a sublist
inside it, and that sublist has yet another sublist, the numbers must be
expanded to three places, so the very first element would be 1.0.0, the
second element would be 2.0.0, etc., but the sublist off the first element
would have 1.1.0 for the first element. The first element off the first
sublist of the first sublist would be 1.1.1. If sublists were nested in a list
five deep, the very first number would be 1.0.0.0.0, but if they nested
only two deep, the first number would be 1.0. The escape code for this
type of list is \LSMN { ... }.
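The two passes described for Military Notation Lists (first find the
deepest nesting, then pad every number out to that many places) can be
sketched as follows. This assumes the list has already been parsed into
nested (text, sublist) pairs; that representation and the names list_depth
and military_numbers are inventions for this example.

```python
def list_depth(items):
    """How many levels deep the list goes (a flat list has depth 1)."""
    return 1 + max((list_depth(sub) for _, sub in items if sub), default=0)


def military_numbers(items, places=None, prefix=()):
    """Return 'number text' lines, each number padded with zeros to full depth."""
    if places is None:
        places = list_depth(items)           # first pass: find the depth
    lines = []
    for n, (text, sub) in enumerate(items, start=1):
        path = prefix + (n,)
        padded = path + (0,) * (places - len(path))
        lines.append(".".join(str(p) for p in padded) + " " + text)
        lines.extend(military_numbers(sub, places, path))
    return lines
```

For a list whose first element has a sublist with a sublist of its own,
this yields 1.0.0, 1.1.0, 1.1.1 and then 2.0.0 for the second top-level
element, matching the description above.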
Separator Characters: With the exception of the Military Notation List,
all the lists have one space in their command for a single character.
This character must be chosen off the following list:
. Uses a period after the list identifier.
, Uses a comma after the list identifier.
: Uses a colon after the list identifier.
- Uses a dash after the list identifier.
) Uses a right parenthesis after the list identifier.
_ Puts nothing after the list identifier.
> Puts an arrow after the list identifier.
* Puts a bullet after the list identifier.
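Putting the list types and separator characters together, a reader might
build each element's identifier as in the sketch below. It is an
illustration, not part of the standard: item_label and to_roman are
hypothetical names, and rendering the arrow as ">" and the bullet as "*" is
an assumption for 7-bit displays.

```python
import string

# Separator character in the list code -> text placed after the identifier.
SEPARATORS = {
    ".": ".", ",": ",", ":": ":", "-": "-", ")": ")",
    "_": "",      # nothing after the identifier
    ">": ">",     # arrow (assumed 7-bit rendering)
    "*": "*",     # bullet (assumed 7-bit rendering)
}

_ROMAN = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
          (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
          (5, "v"), (4, "iv"), (1, "i")]


def to_roman(n):
    out = ""
    for value, symbol in _ROMAN:
        while n >= value:
            out += symbol
            n -= value
    return out


def item_label(list_code, n):
    """list_code is the four characters after the backslash, e.g. 'LSA.'."""
    kind, sep = list_code[2], list_code[3]
    if kind == "A":
        ident = str(n)
    elif kind in ("l", "L"):
        if not 1 <= n <= 26:
            raise ValueError("letter lists hold at most 26 elements")
        ident = string.ascii_lowercase[n - 1]
    elif kind in ("r", "R"):
        ident = to_roman(n)
    elif kind == "_":
        ident = ""
    else:
        raise ValueError("unknown list type: " + list_code)
    if kind in ("L", "R"):
        ident = ident.upper()
    return ident + SEPARATORS[sep]
```

So item_label("LSA.", 3) gives "3." and item_label("LS_*", 1) gives a bare
bullet marker.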
Matrices:
There have been several suggestions for matrices, but I have yet to figure
out exactly how to implement them. A matrix will be given the escape
code \MTRX { ... }, so until a matrix standard is produced, ignore the
matrices.
Conclusion:
This is the first Really Big Galactic Guide Format in the Guide's history.
Undoubtedly, there are many problems with what I've put together here, and
I've almost certainly left things out. But that's what revisions are all
about. With this standard, however, the use of escape codes allows for
future expansion very easily, and any revisions will most likely not be
of such a large scale. I want to take this time here to thank everyone
who actually put more than thirty seconds of thought into this project,
and especially everyone who stuck with the project from the very beginning.
And a really big hand to all the programmers who've created Guide readers,
cuz they're really going to be pissed when they try to program for this
monstrosity!
...Paul